Acta Psychiatrica Scandinavica
Wiley
All preprints, ranked by how well they match Acta Psychiatrica Scandinavica's content profile, based on 10 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Fabian Eitel; Sebastian Stober; Lea Waller; Lena Dorfschmidt; Henrik Walter; Kerstin Ritter
The authors have withdrawn this manuscript because the results were posted in error. The authors do not wish this work to be cited as a reference for the project. Please contact the corresponding author if you have any questions.
Atwood, B.; Holderness, E.; Verhagen, M.; Shinn, A. K.; Cawkwell, P.; Cerruti, H.; Pustejovsky, J.; Hall, M.-H.
Psychiatric electronic health records present unique challenges for machine learning due to their unstructured, complex, and variable nature. This study aimed to identify a cohort of patients with psychotic disorders and posttraumatic stress disorder (PTSD), develop clinically informed guidelines for annotating traumatic events in their health records, create a gold-standard, publicly available dataset, and demonstrate the dataset's suitability for training machine learning models to detect indicators of symptoms, substance use, and trauma in new records. We compiled a representative corpus of 200 narrative-heavy health records (470,489 tokens) from a centralized database and developed a detailed annotation scheme with a team of clinical experts and computational linguists. Clinicians annotated the corpus for trauma-related events and relevant clinical information with high inter-annotator agreement (0.715 for entity/span tags and 0.874 for attributes). Additionally, machine learning models were developed to demonstrate the practical viability of the gold-standard corpus for machine learning applications, achieving micro-F1 scores of 0.76 for spans and 0.82 for attributes, indicative of their predictive reliability. This study established the first gold-standard dataset for the complex task of labelling traumatic features in psychiatric health records. High inter-annotator agreement and model performance illustrate its utility in advancing the application of machine learning in psychiatric healthcare to better understand disease heterogeneity and treatment implications.
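The span-level micro-F1 reported above pools true positives, false positives, and false negatives over all annotated spans before computing precision and recall. A minimal sketch (the span tuple format and toy data are illustrative assumptions, not the paper's actual annotation format):

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over exact-match spans.

    gold, pred: sets of (doc_id, start, end, label) tuples. Counts are
    pooled across all documents before precision/recall are computed.
    """
    tp = len(gold & pred)
    fp = len(pred - gold)
    fn = len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = {(1, 0, 5, "TRAUMA"), (1, 10, 15, "SUBSTANCE"), (2, 3, 8, "SYMPTOM")}
pred = {(1, 0, 5, "TRAUMA"), (1, 10, 15, "SYMPTOM"), (2, 3, 8, "SYMPTOM")}
print(round(micro_f1(gold, pred), 3))  # 2 of 3 predictions match exactly -> 0.667
```

Exact-match scoring is the strictest convention; partial-overlap variants are also common for span tasks.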
Mineur, L.; Heide, M.; Eickhoff, S.; Avram, M.; Franzen, L.; Buschmann, F.; Schroepfer, F.; Rogg, H. V.; Andreou, C.; Bruegge, N.; Handels, H.; Borgwardt, S.; Korda, A.
Mental health research increasingly focuses on the relationship between psychiatric symptoms and observable manifestations of the face and body [1]. In recent studies [2,3], psychiatric patients have shown distinct patterns in movement, posture and facial expressions, suggesting these elements could enhance clinical diagnostics. The analysis of facial expressions is grounded in the Facial Action Coding System (FACS) [4], which provides a systematic method for categorizing facial expressions based on specific muscle movements, enabling detailed analysis of emotional and communicative behaviors. Combined with recent advances in Artificial Intelligence (AI), this method has shown promising results for detecting a patient's mental state. We analyze video data from patients with various psychiatric symptoms, using open-source Python toolboxes for facial expression and body movement analysis. These toolboxes facilitate face detection, facial landmark detection, emotion detection and motion recognition. Specifically, we aim to explore the connection between these physical expressions and established diagnostic tools, such as symptom severity scores, and ultimately enhance psychiatric diagnostics by integrating AI-driven analysis of video data. By providing a more objective and detailed understanding of psychiatric symptoms, this study could lead to earlier detection and more personalized treatment approaches, ultimately improving patient outcomes. The findings will contribute to the development of innovative diagnostic tools that are both efficient and accurate, addressing a critical need in mental health care.
Korda, A.
Suicide attempts are one of the most challenging psychiatric outcomes and have great importance in clinical practice. However, they remain difficult to detect in a standardised way to assist prevention because assessment is mostly qualitative and often subjective. As digital documentation is increasingly used in the medical field, Electronic Health Records (EHRs) have become a source of information that can be used for prevention purposes, containing codified data, structured data, and unstructured free text. This study aims to provide a quantitative approach to suicidality detection using EHRs, employing natural language processing techniques in combination with deep learning artificial intelligence methods to create an algorithm intended for use with medical documentation in German. Using psychiatric medical files from in-patient psychiatric hospitalisations between 2013 and 2021, free-text reports will be transformed into structured embeddings using a German-trained adaptation of Word2Vec, followed by a Long Short-Term Memory (LSTM) - Convolutional Neural Network (CNN) approach on sentences of interest. Text outside the sentences of interest will be analysed as context using a fixed-size ordinally-forgetting encoding (FOFE) before combining these findings with the LSTM-CNN results in order to label suicide-related content. This study offers a promising approach to the automated early detection of suicide attempts and therefore holds opportunities for mental health care.
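The fixed-size ordinally-forgetting encoding mentioned above compresses a token sequence of any length into a single fixed-length vector via the recursion z_t = α·z_{t-1} + e_t, where e_t is the one-hot vector of the t-th token and α in (0, 1) is the forgetting factor. A minimal sketch (the vocabulary size and α value are illustrative assumptions):

```python
def fofe(token_ids, vocab_size, alpha=0.7):
    """Fixed-size Ordinally-Forgetting Encoding (FOFE).

    Recursion: z_t = alpha * z_{t-1} + e_t, so earlier tokens are
    exponentially down-weighted while word order remains recoverable
    for alpha in (0, 1).
    """
    z = [0.0] * vocab_size
    for tok in token_ids:
        z = [alpha * v for v in z]  # decay older context
        z[tok] += 1.0               # add the current one-hot token
    return z

# Token 0 appears first and last; its two occurrences accumulate with
# different weights, unlike in an order-blind bag-of-words.
print(fofe([0, 1, 0], vocab_size=3, alpha=0.5))  # [1.25, 0.5, 0.0]
```

In the study's setup, a vector like this would summarize the context outside the sentences of interest before being combined with the LSTM-CNN features.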
Ehrig, L.; Wagner, A.-C.; Wolter, H.; Correll, C. U.; Geisel, O.; Konigorski, S.
Fetal alcohol-spectrum disorder (FASD) is underdiagnosed and often misdiagnosed as attention-deficit/hyperactivity disorder (ADHD). Here, we developed a screening tool for FASD in youth with ADHD symptoms. To develop the prediction model, medical record data from a German university outpatient unit were assessed, including 275 patients with FASD (with or without ADHD) and 170 patients with ADHD without FASD, all aged 0-19 years. We trained six machine learning models based on 13 selected variables and evaluated their performance. Random forests yielded the best prediction models, with a cross-validated AUC of 0.92 (95% confidence interval [0.84, 0.99]). Follow-up analyses indicated that a random forest model with six variables - body length and head circumference at birth, IQ, socially intrusive behaviour, poor memory and sleep disturbance - yielded equivalent predictive accuracy. We implemented the prediction model in a web-based app called FASDetect - a user-friendly, clinically scalable FASD risk calculator that is freely available at https://fasdetect.dhc-lab.hpi.de.
Wagner, M.; Jagayat, J.; Kumar, A.; Shirazi, A.; Alavi, N.; Omrani, M.
Mental health care is in a state of crisis, with demand for mental health services significantly surpassing available care. As such, building scalable and objective measurement tools for mental health evaluation is of primary concern. Given its use in diagnostics and treatment, spoken language stands out as a potential measurement modality. Here, a model is built for mental health status evaluation using natural language processing. Specifically, a RoBERTa-based model is fine-tuned on text from psychotherapy sessions to predict mental health status, with prediction accuracy on par with clinical evaluations at 74%.
Adhikary, P. K.; Singh, S.; Singh, S.; Sharma, P.; Soni, P.; Choudhary, R.; Saxena, C.; Chauhan, P.; Gupta, S. K.; Deb, K. S.; Singh, S. M.; Chakraborty, T.
Psychotherapy note-making is crucial for effective patient care. However, traditional formats such as SOAP (Subjective, Objective, Assessment, and Plan) and BIRP (Behavior, Intervention, Response, and Plan) often fail to capture the nuanced complexities of therapeutic sessions, as they primarily focus on surface-level details and lack a comprehensive understanding of the patient's history, mental status, and therapeutic process. While recent advances in Artificial Intelligence (AI) and Large Language Models (LLMs) show promise in clinical documentation, their application in psychotherapy note summarisation remains unexplored. We present iCARE (identifiers, Chief Concerns and Clinical History, Assessment and Analysis, Risk and Crisis, Engagement and Next Steps), a comprehensive framework for AI-assisted psychotherapy documentation that addresses these limitations. iCARE comprises 17 clinically relevant aspects, developed collaboratively with mental health professionals and aligned with established guidelines. We further introduce PATH (Psychotherapy Aspects and Treatment History summary), a novel dataset of annotated therapy sessions. Through extensive benchmarking with 11 LLMs, including both open- and closed-source models, we evaluate their performance across different note-taking aspects using automatic and human evaluation metrics. Our results show that closed-source models like Gemini Pro and GPT4o-mini excel in various aspects, with Gemini Pro achieving superior human evaluation scores. Notably, all models struggle with temporal reasoning and complex therapeutic interpretations. The findings suggest that current LLMs can assist in basic documentation but require improvements in handling longitudinal therapeutic relationships and aspects that require deeper clinical understanding and interpretative reasoning. This work advances mental health care documentation while emphasising the need for continued clinical expertise in psychotherapy note summarisation.
Hua, Y.; Blackley, S. V.; Shinn, A. K.; Skinner, J. P.; Moran, L. V.; Zhou, L.
Early and accurate diagnosis is crucial for effective treatment and improved outcomes, yet identifying psychotic episodes presents significant challenges due to its complex nature and the varied presentation of symptoms among individuals. One of the primary difficulties lies in the underreporting and underdiagnosis of psychosis, compounded by the stigma surrounding mental health and individuals' often diminished insight into their condition. Existing efforts leveraging Electronic Health Records (EHRs) to retrospectively identify psychosis typically rely on structured data, such as medical codes and patient demographics, which frequently lack essential information. Addressing these challenges, our study leverages Natural Language Processing (NLP) algorithms to analyze psychiatric admission notes for the diagnosis of psychosis, providing a detailed evaluation of rule-based algorithms, machine learning models, and pre-trained language models. Additionally, the study investigates the effectiveness of employing keywords to streamline extensive note data before training and evaluating the models. Analyzing 4,617 initial psychiatric admission notes (1,196 cases of psychosis versus 3,433 controls) from 2005 to 2019, we discovered that the XGBoost classifier employing Term Frequency-Inverse Document Frequency (TF-IDF) features derived from notes pre-selected by expert-curated keywords attained the highest performance, with an F1 score of 0.8881 (AUROC [95% CI]: 0.9725 [0.9717, 0.9733]). BlueBERT demonstrated comparable efficacy, with an F1 score of 0.8841 (AUROC [95% CI]: 0.97 [0.9580, 0.9820]) on the same set of notes. Both models markedly outperformed traditional International Classification of Diseases (ICD) code-based detection methods from discharge summaries, which had an F1 score of 0.7608, an improvement of roughly 0.12.
Furthermore, our findings indicate that keyword pre-selection markedly enhances the performance of both machine learning and pre-trained language models. This study illustrates the potential of NLP techniques to improve psychosis detection within admission notes and aims to serve as a foundational reference for future research on applying NLP for psychosis identification in EHR notes.
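The preprocessing idea described here — filtering notes down to keyword-bearing sentences before extracting TF-IDF features — can be sketched in a few lines. This is a toy illustration under assumed keywords and a naive sentence splitter, not the study's expert-curated list or pipeline:

```python
import math
import re

def keyword_select(note, keywords):
    """Keep only sentences containing at least one curated keyword."""
    sentences = re.split(r"(?<=[.!?])\s+", note)
    kw = [k.lower() for k in keywords]
    return [s for s in sentences if any(k in s.lower() for k in kw)]

def tfidf(docs):
    """Plain TF-IDF: term frequency times log(N / document frequency)."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = {}
    for toks in tokenized:
        for t in set(toks):
            df[t] = df.get(t, 0) + 1
    return [
        {t: toks.count(t) * math.log(n / df[t]) for t in set(toks)}
        for toks in tokenized
    ]

note = "Patient is calm. Reports hearing voices at night. Sleep is stable."
print(keyword_select(note, ["voices", "paranoia"]))
```

The filtered sentences, rather than the full note, would then be vectorized and fed to a classifier such as XGBoost.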
Ahadian, P.; Fragano, A.; Guan, T.; Guan, Q.; Shalhout, S. Z.
Background: Depression is a leading cause of global disability. Timely identification of patients at risk for clinical worsening remains a major challenge. Electronic health records (EHRs) facilitate large-scale, real-world analyses of disease trajectories. However, standardized symptom scale data such as the Patient Health Questionnaire-9 are often unavailable or recorded only as unstructured text. In this context, International Classification of Diseases (ICD10) diagnostic-code-based severity progression provides a pragmatic alternative for developing predictive tools to identify worsening depression. Objective: We aim to develop and evaluate machine-learning and deep-learning models for predicting ICD10-defined progression from mild to moderate/severe depression using EHR data curated by the MedStar Health Research Institute (MHRI). Methods: We conducted a multi-institutional retrospective cohort analysis using the MHRI EHR database, which integrates data from 10 hospitals and 300 outpatient sites across the mid-Atlantic. Adults (≥18 years) with an initial ICD10 diagnosis of mild depression between 2017 and 2023 were included (N=2131). Nonprogressors were defined as patients whose mild major depressive disorder remained mild for 24 months (N=270). Progressors were defined as patients who developed moderate or severe ICD10 depression within 24 months of the index diagnosis (N=533). Data were stratified and split into training (60%), validation (20%), and test (20%) subsets. A heterogeneous feature set spanning demographics, healthcare utilization, socioeconomic indices, diagnostic context, and laboratory measurements was available. Logistic regression used elastic net regularization with fivefold cross-validation, and random forest hyperparameters were tuned by grid search. XGBoost, CatBoost, and a deep neural network (DNN) were trained with standard learning rate, depth, class weighting, and early stopping.
A deterministic top-model selection framework applied prespecified thresholds of sensitivity of at least 0.70 and AUC of at least 0.70, and composite rankings integrated accuracy, sensitivity, specificity, and the overfitting gap. Results: The analytic cohort included 803 patients with complete two-year follow-up. Under the selection criteria, the DNN failed to meet the AUC threshold (0.671) and was excluded. Among the remaining models, XGBoost achieved the top composite score (accuracy = 0.72; AUC = 0.776; sensitivity = 0.77; specificity = 0.63; overfit gap = 0.112). Logistic regression ranked second (accuracy = 0.71; AUC = 0.797; sensitivity = 0.79; specificity = 0.61; overfit gap = 0.052), followed by CatBoost and random forest, the latter penalized for overfitting (gap = 0.278). The TinyLlama audit note, generated through a local Hugging Face pipeline, confirmed XGBoost as the most balanced model. Conclusions: Using EHR data from a multi-institutional regional health system, we developed and validated machine-learning models that predicted progression of depression. XGBoost demonstrated the most reliable composite performance. These findings support the feasibility of leveraging socioeconomic and EHR data to predict worsening depression and emphasize the importance of transparent model-selection frameworks for trustworthy clinical artificial intelligence.
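A gate-then-rank selection framework of this kind can be sketched as follows. The composite score below (balanced metrics minus the overfitting gap) and the toy metric values are illustrative assumptions — the abstract does not give the exact composite formula:

```python
def select_models(results, min_sensitivity=0.70, min_auc=0.70):
    """Deterministic gate-then-rank model selection.

    results: dict name -> metrics dict. Models failing either
    prespecified threshold are excluded; survivors are ranked by an
    assumed composite score (higher is better).
    """
    def composite(m):
        return (m["accuracy"] + m["sensitivity"] + m["specificity"]
                - m["overfit_gap"])

    eligible = {
        name: m for name, m in results.items()
        if m["sensitivity"] >= min_sensitivity and m["auc"] >= min_auc
    }
    return sorted(eligible, key=lambda n: composite(eligible[n]), reverse=True)

toy = {
    "xgboost": {"accuracy": 0.75, "auc": 0.80, "sensitivity": 0.78,
                "specificity": 0.65, "overfit_gap": 0.05},
    "dnn":     {"accuracy": 0.70, "auc": 0.671, "sensitivity": 0.72,
                "specificity": 0.60, "overfit_gap": 0.10},
}
print(select_models(toy))  # the DNN fails the AUC gate and is dropped
```

Making the gates and the ranking function explicit like this is what keeps the selection deterministic and auditable.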
Verhees, F. G.; Huth, F.; Meyer, V.; Wolf, F.; Bauer, M.; Pfennig, A.; Ritter, P.; Kather, J. N.; Wiest, I. C.; Mikolas, P.
Background: Prompt engineering has the potential to enhance the ability of large language models (LLMs) to solve tasks through improved in-context learning. In clinical research, the use of LLMs has shown expert-level performance for a variety of tasks ranging from pathology slide classification to identifying suicidality. We introduce clickBrick, a modular prompt-engineering framework, and rigorously test its effectiveness. Methods: Here, we explore the effects of increasingly structuring prompts with the clickBrick framework for a comprehensive psychopathological assessment of 100 index patients from psychiatric electronic health records. We compare the performance of a locally run LLM (Llama-3.1-70B-Instruct) against an expert-labelled ground truth for a variety of successively built-up prompts for the extraction of 12 transdiagnostic psychopathological criteria. Potential clinical value was explored by training linear support vector machines on outputs from the strongest and weakest prompts to predict discharge ICD-10 main diagnoses for a historical sample of 1,692 patients. Outcomes: We could reliably extract information across 12 distinct psychopathological classification tasks from unstructured clinical text with balanced accuracies spanning 71% to 94%. Across tasks, we observed substantially improved extraction accuracy (between +19% and +36%) using clickBrick. The comparison unveiled great variation between prompts, with a reasoning prompt performing best in 7 out of 12 domains. Clinical value and internal validity were approximated by downstream classification of eventual psychiatric diagnoses for 1,692 patients. Here, clickBrick led to an improvement in overall classification accuracy from 71% to 76%. Interpretation: clickBrick prompt engineering, i.e. iterative, expert-led design and testing, is critical for unlocking LLMs' clinical potential.
The framework offers a reproducible pathway for deploying trustworthy generative AI across mental health and other clinical fields. Funding: The German Ministry of Research, Technology and Space and the German Research Foundation. Research in context: Evidence before this study: We searched PubMed/MEDLINE for articles without language restrictions published before June 25, 2025 that combined three concept blocks - "prompt engineering" or related synonyms, "large language model/LLM" or specific model names (e.g., ChatGPT, GPT-4, LLaMA), and psychiatric or mental-health terms (e.g., psychiatry, psychotherapy, depression, anxiety). Additionally, we asked ChatGPT o3 to design and execute a systematic review strategy to also capture relevant but not-yet peer-reviewed preprints, given only our manuscript title. After manual de-duplication and abstract screening, three of the 23 identified studies offered at least some information on their prompting strategies and were conducted on real-world clinical data: from psychotherapy transcripts (one study on multi-dimensional counselling therapy, not peer reviewed) or from online patient portal queries (two peer-reviewed studies on (a) empathy evaluation and (b) provider satisfaction and use of generated responses, with partial integration with electronic health records). None systematically structured their prompts in a transparent way or tested reasoning prompts. Beyond psychiatry, one study analyzing automated echocardiography reports did employ a comparison between two different prompts and an expert-led design strategy. A single study used structured and transparent prompt engineering to generate automated responses for simulated problem-solving therapy sessions. None of the highlighted studies reported both head-to-head comparisons of competing prompt strategies for full reproducibility and their application in real-world care, e.g. on electronic health records.
Collectively, the existing literature suggests growing interest but reveals a paucity of rigorous evidence on how prompt engineering impacts large language model performance in clinical psychiatry, particularly in real-world settings. Added value of this study: We demonstrate reliable information extraction from electronic health records across 12 distinct psychopathological classification tasks from unstructured clinical text, with substantially improved extraction accuracy (between +19% and +36%) using clickBrick, our prompt-engineering framework. The rationale for such an approach is supported by the surprising identification of zero-shot, few-shot and reasoning prompts as the best-performing prompts for different tasks, with a Chain-of-Thought reasoning prompt performing best in 7 out of 12 tasks. And while most studies rely on proprietary language models such as OpenAI's ChatGPT, our locally run version of a popular open-weight model (Llama-3.1-70B-Instruct) allows for privacy safeguarding of sensitive patient data, which is essential for ethical clinical application. Implications of all the available evidence: Generative artificial intelligence is poised to benefit psychiatric patients greatly, powering advances from therapy delivery to decision support and patient outreach. Rigorous prompt engineering with tools like clickBrick heightens their reliability and credibility, making clickBrick a cornerstone for bringing AI into everyday psychiatric care.
Mooney, M. A.; Neighbor, C.; Karalunas, S.; Dieckmann, N. F.; Nikolas, M.; Nousen, E.; Tipsord, J.; Song, X.; Nigg, J. T.
Proper diagnosis of ADHD is costly, requiring in-depth evaluation via interview, multi-informant and observational assessment, and scrutiny of possible other conditions. The increasing availability of data may allow the development of machine-learning algorithms capable of accurate diagnostic predictions using low-cost measures. We report on the performance of multiple classification methods used to predict a clinician-consensus ADHD diagnosis. Classification methods ranged from fairly simple (e.g., logistic regression) to more complex (e.g., random forest), and also included a multi-stage Bayesian approach. All methods were evaluated in two large (N>1000), independent cohorts. The multi-stage Bayesian classifier provides an intuitive approach that is consistent with clinical workflows and is able to predict ADHD diagnosis with high accuracy (>86%), though not significantly better than other commonly used classifiers, including logistic regression. Results suggest that data from parent and teacher surveys are sufficient for high-confidence classifications in the vast majority of cases using relatively straightforward methods.
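A multi-stage Bayesian classifier of this kind updates the probability of diagnosis after each informant's screen, carrying each posterior forward as the next stage's prior. A minimal sketch (the base rate, sensitivities, and specificities below are made-up illustration values, not estimates from the study):

```python
def bayes_update(prior, sensitivity, specificity, screen_positive):
    """One Bayesian update of P(diagnosis) from a binary screen result."""
    if screen_positive:
        num = sensitivity * prior
        den = num + (1.0 - specificity) * (1.0 - prior)
    else:
        num = (1.0 - sensitivity) * prior
        den = num + specificity * (1.0 - prior)
    return num / den

# Stage 1: parent survey positive; stage 2: teacher survey positive.
p = 0.20                                # assumed base rate
p = bayes_update(p, 0.90, 0.80, True)   # after the parent stage
p = bayes_update(p, 0.85, 0.75, True)   # after the teacher stage
print(round(p, 3))
```

Chaining stages like this mirrors a clinical workflow: each new informant either pushes the posterior toward a confident call or leaves the case for fuller evaluation.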
Varone, G.; Kumar, P.; Brown, J.; Boulila, W.
Psychiatric disorders are fundamentally challenged by symptom heterogeneity, high comorbidity, and the absence of objective biomarkers, which together result in substantial variability in clinical assessment and treatment selection. Patient-generated language captures rich information about subjective experience and symptom severity, which can be systematically encoded and analyzed using computational models, making it a scalable signal for psychiatric assessment. We compare two approaches: (i) a domain-specialized transformer fine-tuned on clinical language, based on the Bio-ClinicalBERT encoder architecture, and (ii) a large-scale instruction-tuned generalist encoder (Instructor-XL) used as a frozen feature extractor with a shallow classification head. A corpus of N = 151,228 de-identified texts was compiled from five public sources, covering four psychiatric phenotypes: anxiety, depression, schizophrenia, and suicidal intention. Models were evaluated using stratified 10-fold cross-validation with cost-sensitive training, prioritizing imbalance-aware metrics, including Macro-F1 and Matthews Correlation Coefficient (MCC), over accuracy. Bio-ClinicalBERT achieved superior overall performance (Macro-F1 = 0.78, MCC = 0.6752), indicating more reliable separation of diagnostically overlapping affective categories. In contrast, Instructor-XL achieved its highest class-specific performance for schizophrenia (F1 = 0.798). Explainability analyses suggest that the domain-specialized model places greater weight on clinically relevant terms, whereas the generalist model relies on a broader set of lexical features.
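The Matthews Correlation Coefficient prioritized above summarizes all four cells of the binary confusion matrix, which makes it more informative than raw accuracy under class imbalance. A minimal binary-case sketch with toy counts:

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient for a binary confusion matrix.

    Returns a value in [-1, 1]; by common convention, 0 when any
    marginal total is zero (the degenerate case).
    """
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

print(mcc(tp=50, tn=40, fp=5, fn=10))  # strong but imperfect classifier
print(mcc(tp=5, tn=5, fp=0, fn=0))     # perfect prediction -> 1.0
```

Because every cell enters the formula, a classifier that ignores a minority class cannot score well, which is exactly why the study reports MCC alongside Macro-F1.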
Trivedi, S.; Simons, N. W.; Tyagi, A.; Ramaswamy, A.; Nadkarni, G. N.; Charney, A. W.
Background: Large language models (LLMs) are increasingly used in mental health contexts, yet their detection of suicidal ideation is inconsistent, raising patient safety concerns. Objective: To evaluate whether an independent safety monitoring system improves detection of suicide risk compared with native LLM safeguards. Methods: We conducted a cross-sectional evaluation using 224 paired suicide-related clinical vignettes presented in a single-turn format under two conditions (with and without structured clinical information). Native LLM safeguard responses were compared with an independent supervisory safety architecture with asynchronous monitoring. The primary outcome was detection of suicide risk requiring intervention. Results: The supervisory system detected suicide risk in 205 of 224 evaluations (91.5%) versus 41 of 224 (18.3%) for native LLM safeguards. Among 168 discordant evaluations, 166 favored the supervisory system and 2 favored the LLM (matched odds ratio ≈83.0). Both systems detected risk in 39 evaluations, and neither in 17. Detection was highest in scenarios with explicit suicidal ideation and lower in more ambiguous presentations. Conclusions: Native LLM safeguards frequently failed to detect suicide risk in this structured evaluation. An independent monitoring approach substantially improved detection, supporting the role of external safety systems in high-risk mental health applications of LLMs.
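The matched odds ratio quoted above follows directly from the discordant pairs, as in McNemar-style paired analyses: pairs where only one system detected risk divided by pairs where only the other did. A minimal sketch using the counts reported in the abstract:

```python
def matched_odds_ratio(discordant_a_only, discordant_b_only):
    """Conditional (matched-pairs) odds ratio: the ratio of the two
    discordant cells of a paired 2x2 table, as in McNemar-type tests.
    """
    if discordant_b_only == 0:
        raise ValueError("odds ratio undefined with zero discordant pairs")
    return discordant_a_only / discordant_b_only

# 166 pairs favored the supervisory system; 2 favored the native LLM.
print(matched_odds_ratio(166, 2))  # 83.0
```

Concordant pairs (both detect, or neither) do not enter the estimate, which is why the abstract reports them separately.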
Kolding, S.; Damgaard, J. G.; Bernstorff, M.; Hansen, L.; Ostergaard, S. D.; Danielsen, A. A.
Introduction: Use of coercive measures in psychiatric hospitals is clinically and ethically challenging. Aiming to support prevention, we developed and evaluated machine learning models to predict both mechanical restraint and a broader composite outcome that includes related coercive measures. Methods: The dataset comprised electronic health records (EHRs) from adults (≥18 years) who had at least one admission to the Psychiatric Services in the Central Denmark Region between 2015 and 2021. For each inpatient day, XGBoost machine learning models were trained to predict mechanical restraint or composite (mechanical, chemical, or manual) restraint within 48 hours. Hyperparameters were optimised for the area under the receiver operating characteristic curve (AUROC) using five-fold cross-validation on 85% of the data, with performance validated on a held-out 15% test set. Results: The cohort included 16,834 patients with 45,179 inpatient stays, covering 687,388 prediction days. Of these, 2,736 days were followed by a restraint episode within 48 hours, including 983 episodes of mechanical restraint. The final models were trained on 2,389 EHR-based predictors, derived from demographics, diagnoses, medications, and clinical notes. The mechanical restraint model achieved an AUROC of 0.921 (95% CI: 0.918-0.922) and a positive predictive value (PPV) of 4.9% when classifying the top 1% of risk scores as positive. The composite model achieved an AUROC of 0.912 (95% CI: 0.909-0.913) and a PPV of 4.2% when predicting mechanical restraint, and 0.900 (95% CI: 0.898-0.900) with a PPV of 10.4% when predicting composite restraint. Conclusion: The results indicate that incorporating related coercive measures into model training did not improve discrimination (AUROC) for predicting mechanical restraint but did increase PPV when predicting composite restraint, reflecting the higher outcome prevalence.
This suggests that leveraging related outcomes can inform prediction of rare events, emphasising the importance of problem framing in clinical prediction modelling. Future work should include external validation across temporal, geographic, and demographic contexts.
Significant Outcomes:
- A machine learning model trained solely for predicting mechanical restraint achieved strong performance (AUROC 0.92), identifying nearly one-third of restraint cases at high specificity.
- Training on a broader composite outcome yielded similar discriminatory performance when predicting mechanical restraint, while the higher base rate resulted in a higher positive predictive value for predicting composite restraint.
- Broadening the outcome to include multiple restraint types increased the number of at-risk patients detected due to the higher prevalence, without compromising accuracy for mechanical restraint, supporting shared underlying risk factors.
Limitations:
- The model requires more extensive external validation to assess generalisability across time, demographic groups, and settings, which may be limited by regional/national differences in legislation and clinical documentation.
- Prediction performance was highest near the restraint event, limiting early forecasting and suggesting that restricting predictions to the early phase of hospitalisation, where most restraint occurs, could elevate the base rate and improve model performance.
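A PPV "at the top 1% of risk scores" is computed by flagging only the highest-scoring prediction days as positive and measuring what fraction of those flags precede a real restraint episode. A minimal sketch with toy scores (the data are illustrative, not from the study):

```python
def ppv_at_top_fraction(scores, labels, fraction=0.01):
    """PPV when only the top `fraction` of risk scores are flagged positive.

    scores: predicted risk per prediction day; labels: 1 if a restraint
    episode actually followed within the horizon, else 0.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    flagged = ranked[:k]
    return sum(label for _, label in flagged) / k

scores = [0.95, 0.80, 0.60, 0.40, 0.20, 0.10, 0.05, 0.02]
labels = [1,    0,    1,    0,    0,    0,    0,    0]
print(ppv_at_top_fraction(scores, labels, fraction=0.25))  # top 2 flagged -> 0.5
```

With a rarer outcome, the same ranking yields a lower PPV at a fixed alert budget — which is exactly the base-rate effect the conclusion describes for composite versus mechanical restraint.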
Mali, Y.; Zeng, Z.; Heo, K.; Zhang, G.; Chen, J.; Keramatian, K.; Saraf, G.; Solmi, M.; Tam, E.; Parikh, S.; Schaffer, A.; Beaulieu, S.; Ng, R.; Yatham, L. N.; Nunez, J.-J.
Objective: Clinical practice guidelines support evidence-based care but are often underused due to complexity, time constraints, and navigation challenges. We investigated whether a conversational agent (chatbot) using an open-weight large language model (LLM) with retrieval-augmented generation (RAG) could provide guideline-consistent answers for bipolar disorder management based on the full 2018 CANMAT and ISBD guidelines, compared against a system using only the base LLM. Method: We developed a multi-step RAG-based chatbot that retrieves relevant guideline sections and generates responses using Llama 3.3 70B. Twenty-one clinical vignettes spanning all guideline sections were created. Six expert psychiatrists generated queries and were presented with unlabelled paired responses from two systems: one using the base Llama 3.3 70B model, the other RAG-enhanced. Responses were rated for guideline consistency on a three-point scale and analyzed using mixed-effects ordinal logistic regression. Results: Experts evaluated 126 responses, of which 110 (87.3%) were rated as more correct than, or as correct as, those from the baseline system. The RAG system produced 80 answers (63.5%) rated fully consistent with the guidelines versus 24 (19.0%) for baseline, and only 10 answers with major deviations (7.9%) versus 48 (38.1%) for baseline. Ordinal regression showed RAG responses were significantly more likely to be more correct (OR = 9.1, 95% CI 5.3-16.3, p < 0.001), consistently across all raters. Preference ratings favored RAG answers in 78.7% of cases. Performance varied by vignette, with some errors in both retrieval and reasoning. Conclusion: The use of RAG with an open-weight model helped produce answers consistent with the CANMAT guidelines across vignettes that required adapting or combining guideline text, suggesting the viability of a bipolar guideline chatbot. We identified areas in which to improve results and evaluation.
Future work should explore additional retrieval strategies and LLMs, and test in more naturalistic settings.
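The retrieval step of a RAG pipeline like the one described above reduces to scoring guideline chunks against the query embedding and passing the top hits to the LLM as context. A minimal cosine-similarity sketch (the chunk texts and two-dimensional embeddings are toy assumptions; a real system would use a learned embedding model over the actual guideline sections):

```python
def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_embedding, chunks, top_k=2):
    """Return the text of the top_k chunks most similar to the query."""
    ranked = sorted(chunks,
                    key=lambda c: cosine(query_embedding, c["embedding"]),
                    reverse=True)
    return [c["text"] for c in ranked[:top_k]]

chunks = [
    {"text": "First-line maintenance options ...", "embedding": [0.9, 0.1]},
    {"text": "Managing acute mania ...",           "embedding": [0.1, 0.9]},
    {"text": "Psychoeducation ...",                "embedding": [0.5, 0.5]},
]
print(retrieve([1.0, 0.0], chunks, top_k=1))
```

The generation step then prepends the retrieved text to the prompt, which is what grounds the model's answer in the guideline rather than in its parametric memory.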
Morosoli, J. J.; Lind, P. A.; Spears, K.; Pratt, G.; Medland, S. E.
This study examined arrays offered by commercial pharmacogenomic (PGx) testing services for mental health care in Australia and the United States, with a focus on utility for non-European populations. Seven of the 14 testing services we identified provided the manifests of their arrays. We examined allele frequencies for each variant using data from the Allele Frequency Aggregator (ALFA) [1], genome Aggregation Database (gnomAD) [2], Exome Aggregation Consortium (ExAC) [2], and Japanese Multi Omics Reference Panel [3], and examined genetic heterogeneity. We also analyzed meta-data from the Pharmacogenomic Knowledge Base (PharmGKB) [4] and explored the biogeographical origin of supporting evidence for clinical annotations. Most arrays included the minimum allele set recommended by Bousman et al. [5]. However, few arrays included HLA-A or HLA-B. The most diverse allele frequencies were seen for variants in CYP3A5, ADRA2A and GNB3, with European and African populations showing the largest differences. Most evidence listed in PharmGKB originated from European or unknown ancestry samples.
Frydman-Gani, C.; Arias, A.; Perez Vallejo, M.; Londono Martinez, J. D.; Valencia-Echeverry, J.; Castano, M.; Bui, A. A. T.; Freimer, N. B.; Lopez-Jaramillo, C.; Olde Loohuis, L. M.
Show abstract
The accurate detection of clinical phenotypes from electronic health records (EHRs) is pivotal for advancing large-scale genetic and longitudinal studies in psychiatry. Free-text clinical notes are an essential source of symptom-level information, particularly in psychiatry. However, the automated extraction of symptoms from clinical text remains challenging. Here, we tested 11 open-source generative large language models (LLMs) for their ability to detect 109 psychiatric phenotypes from clinical text, using annotated EHR notes from a psychiatric clinic in Colombia. The LLMs were evaluated both "out-of-the-box" and after fine-tuning, and compared against a traditional natural language processing (tNLP) method developed from the same data. We show that while base LLM performance was poor to moderate (0.2-0.6 macro-F1 for zero-shot; 0.2-0.74 macro-F1 for few-shot), it improved significantly after fine-tuning (0.75-0.86 macro-F1), with several fine-tuned LLMs outperforming the tNLP method. In total, 100 phenotypes could be reliably detected (F1 > 0.8) using either a fine-tuned LLM or tNLP. To generate a fine-tuned LLM that can be shared with the scientific and medical community, we created a fully synthetic dataset free of patient information but based on the original annotations. We fine-tuned a top-performing LLM on this data, creating "Mistral-small-psych", an LLM that can detect psychiatric phenotypes from Spanish text with performance comparable to that of LLMs trained on real EHR data (macro-F1 = 0.79). Finally, the fine-tuned LLMs underwent external validation using data from a large psychiatric hospital in Colombia, the Hospital Mental de Antioquia, highlighting that most LLMs generalized well (0.02-0.16 point loss in macro-F1). Our study underscores the value of domain-specific adaptation of LLMs and introduces a new model for accurate psychiatric phenotyping in Spanish text, paving the way for global precision psychiatry.
Gountouna, V.-E.; Bermingham, M.; Kuznetsova, K.; Urda Munoz, D.; Agakov, F.; Robson, S.; Meijsen, J.; Campbell, A.; Hayward, C.; Wigmore, E.; Clarke, T.; Fernandez, A. M.; MacIntyre, D.; McKeigue, P. M.; Porteous, D.; Nicodemus, K.
Show abstract
Depression is a common psychiatric disorder with substantial recurrence risk. Accurate prediction from easily collected data would aid in diagnosis, treatment and prevention. We used machine learning in the Generation Scotland cohort to predict lifetime risk of depression and, among cases, recurrent depression. Rank aggregation was used to combine results across ten different algorithms and identify highly predictive variables. The model containing all but the cardiometabolic predictors had the highest predictive ability on independent data. Rank aggregation produced a reduced set of predictors without decreasing predictive performance (lifetime: 20 of 154 predictors, Receiver Operating Characteristic area under the curve (AUC) = 0.84; recurrent: 10 of 180 predictors, AUC = 0.76). Here we develop a pipeline which leads to a small set of highly predictive variables. This information can be easily collected with a smartphone application to help diagnosis and treatment, while longitudinal tracking may help patients in self-management. Significance: Depression is the most common psychiatric disorder and a leading cause of disability worldwide. Patients are often diagnosed and treated by non-specialist clinicians who have limited time available to assess them. We present a novel methodology which allowed us to identify a small set of highly predictive variables for a diagnosis of depression, or recurrent depression, in patients. This information can easily be collected using a tablet or smartphone application in the clinic to aid diagnosis.
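Rank aggregation across algorithms, the variable-selection step described above, can be illustrated with a Borda-style scheme: sum each predictor's rank position across the algorithms' importance orderings and sort by the total. This is one common aggregation rule, not necessarily the exact method the authors used, and the predictor names are invented for illustration:

```python
from collections import defaultdict

def aggregate_ranks(rankings):
    # Borda-style aggregation: sum each predictor's position (0 = best)
    # across algorithms, then order predictors by total; predictors that
    # rank highly under many algorithms float to the top.
    totals = defaultdict(int)
    for ranking in rankings:
        for pos, name in enumerate(ranking):
            totals[name] += pos
    return sorted(totals, key=totals.get)

# Toy importance orderings from three hypothetical algorithms, best first
rankings = [
    ["mood_score", "sleep", "smoking"],
    ["mood_score", "smoking", "sleep"],
    ["smoking", "mood_score", "sleep"],
]
consensus = aggregate_ranks(rankings)  # ['mood_score', 'smoking', 'sleep']
```

Truncating the consensus list gives the reduced predictor set; the paper's result is that a cut at 20 (lifetime) or 10 (recurrent) predictors preserved AUC.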
Richter, M.; Emden, D.; Leenings, R.; Winter, N. R.; Mikolajczyk, R.; Massag, J.; Zwiky, E.; Borgers, T.; Redlich, R.; Koutsouleris, N.; Falguera, R.; Edwin Thanarajah, S.; Padberg, F.; Reinhard, M. A.; Back, M. D.; Morina, N.; Buhlmann, U.; Kircher, T.; Dannlowski, U.; FOR2107 consortium, ; PRONIA consortium, ; MBB consortium, ; Hahn, T.; Opel, N.
Show abstract
Mental health research faces the challenge of developing machine learning models for clinical decision support. Concerns are rising about the generalizability of such models to real-world populations, due to sampling effects and disparities in available data sources. We examined whether harmonized, structured collection of clinical data and stringent measures against overfitting can facilitate the generalization of machine learning models for predicting depressive symptoms across diverse real-world inpatient and outpatient samples. Despite systematic differences between samples, a sparse machine learning model trained on clinical information exhibited strong generalization across diverse real-world samples. These findings highlight the crucial role of standardized routine data collection, grounded in unified ontologies, in the development of generalizable machine learning models in mental health. One-Sentence Summary: Generalization of sparse machine learning models trained on clinical data is possible for depressive symptom prediction.
Li, Z.; Wang, W.; Shahani, L. R.; Selek, S.; Vieira, R. M.; Soares, J. C.; Liu, H.; Huang, M.
Show abstract
Clinical phenotyping is the process of extracting patients' observable symptoms and traits to better understand their disease condition. Suicide phenotyping focuses more on behavioral and cognitive characteristics, such as suicidal ideation, attempts, and self-injury, to identify suicide risks and improve interventions. In this study, we leveraged the latest reasoning models, namely GPT-4o, o1, and o3-mini, to perform note-level multi-label classification and reasoning-generation tasks using previously annotated psychiatric evaluation notes from a safety-net psychiatric inpatient hospital in Harris County, Texas. Compared with a previously fine-tuned GPT-3.5 model, the out-of-the-box reasoning models prompted with in-context learning achieved comparable or better performance, with the highest accuracy of 0.94 and F1 of 0.90. We also implemented novel clinical-justification generation from these models on top of the traditional classification tasks. These findings mark a promising direction for clinical phenotyping that is interpretable and actionable using smaller, efficient reasoning models.
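For note-level multi-label classification like the task above, "accuracy" and "F1" are typically computed as subset accuracy (all labels on a note must match) and micro-F1 (pooled over label decisions). A minimal scoring sketch with invented example notes; this shows the standard metric definitions, not the paper's exact evaluation code:

```python
def multilabel_scores(gold, pred):
    # gold/pred: one set of suicide-phenotype labels per note.
    # Subset accuracy: the predicted label set must match exactly.
    exact = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    # Micro-F1: pool true/false positives and negatives over all notes.
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    micro_f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return exact, micro_f1

# Toy annotations for three notes (labels are illustrative)
gold = [{"ideation"}, {"attempt", "self-injury"}, {"ideation", "attempt"}]
pred = [{"ideation"}, {"attempt"}, {"ideation", "attempt"}]
exact, micro = multilabel_scores(gold, pred)
```

Micro-F1 credits partially correct notes (the second note here), whereas subset accuracy does not, which is why the two headline numbers in such studies usually differ.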